(54) Multiple port register file with interleaved write ports

(57) A high speed register file is provided for use with Very Long
Word Instruction (VLIW) and N-way superscalar processors. The high
speed register file includes a selected number of copies of a general
purpose register (GPR) building block. The GPR building block includes
at least two interleaved sub-banks of registers. Each of the sub-banks
includes a number N of write ports and a number M of read ports. The
sub-banks are interleaved by write ports and have non-interleaved read
ports.

Description

Field of the Invention 

The present invention relates to a random access read/write memory
(RAM) device with multiple read and write ports or a multiple port
register file, and more particularly to a high speed register file
including 64 or more ports and adapted for use with Very Long Word
Instruction (VLIW) and N-way superscalar processors.

Description of the Prior Art 

Known multiple or multi-port register files are inadequate for use
with Very Long Word Instruction (VLIW) and N-way superscalar
processors. VLIW processors consist of a series of arithmetic logic or
functional units that can execute parts of instructions
simultaneously. VLIW and N-way superscalar processors with 8, 16, or
more pipeline functional units require a central register file for
communication that must provide three or four ports per pipeline, for
a requirement of perhaps 64 ports or more. Current technology simply
does not allow construction of a direct physical implementation of so
many ports, especially with very stringent access and write time
requirements. Currently the most ports physically wireable may be less
than or equal to twenty (20).

Summary of the Invention 

A principal object of the present invention is to provide an improved
multiple port register file for use with Very Long Word Instruction
(VLIW) and N-way superscalar processors.

In brief, a high speed register file is provided for use with Very
Long Word Instruction (VLIW) and N-way superscalar processors. The
high speed register file includes a selected number of copies of a
general purpose register (GPR) building block. The GPR building block
includes at least two interleaved sub-banks of registers. Each of the
sub-banks includes a number N of write ports and a number M of read
ports. The sub-banks are interleaved by write ports and have
non-interleaved read ports.

Brief Description of the Drawings 

The present invention together with the above and 
other objects and advantages may best be understood 
from the following detailed description of the preferred 
embodiments of the invention illustrated in the drawings, 
wherein: 

FIG. 1 is a block diagram illustrating a pair of twenty port general
purpose register (GPR) file physical building blocks in accordance
with the invention;

FIG. 2 is a block diagram illustrating a sixty-four port register file
including four copies of the twenty-eight port GPR file logical
building block of FIG. 1;

FIG. 3 is a block diagram illustrating a write port partitioning
arrangement utilizing the GPR file building blocks of FIG. 1;

FIG. 4 is a block diagram illustrating an alternative reduced write
port partitioning register file arrangement of the invention utilizing
the GPR file building blocks of FIG. 1;

FIG. 5 is a block diagram illustrating a Very Long Word Instruction
(VLIW) processor unit including the sixty-four port register file of
FIG. 4;

FIG. 6 is a block diagram illustrating a seventy-two port register
file including four copies of the twenty-eight port register file
building block of FIG. 1;

FIG. 7 is a schematic diagram illustrating a cell of the register file
building block of FIG. 1 together with an address decode and an
arithmetic and logic unit (ALU) adder;

FIG. 8 is a schematic diagram illustrating a variable performance read
access in accordance with the invention of the GPR file building block
of FIG. 1;

FIG. 9 is an exemplary layout of the GPR file building blocks of
FIG. 1 bit 0;

FIG. 10 is another exemplary layout of the GPR file building blocks of
FIG. 1 bit 0 illustrating an alternative read port layout in
accordance with the invention; and

FIG. 11 is a schematic diagram illustrating an exemplary layout of a
processor unit including the GPR file building blocks of FIG. 1.

Detailed Description of the Preferred Embodiments 

Having reference now to the drawings, FIG. 1 illustrates a
twenty-eight port general purpose register (GPR) file logical building
block in accordance with the invention generally designated by the
reference character 50. In accordance with a feature of the invention,
interleaving of write ports, specifically into even and odd sub-banks
of registers as seen by the write ports, is employed. The GPR register
file building block 50 of depth 32 yields a total of 16 available
write ports and 12 available read ports.

GPR file logical building block 50 includes a pair of twenty port GPR
file physical building blocks 52. The twenty port GPR file physical
building blocks 52 include interleaved write ports of an even GPR
sub-bank 52 and an odd GPR sub-bank 52. The GPR sub-banks 52 provide
12 read ports and 8 write ports with one sub-bank of 32 even registers
and 8 write ports with another sub-bank of 32 odd registers. The GPR
register file building block 50 includes sixteen physical write ports
and twelve physical read ports that are used to construct a 64-port
GPR logical register file generally designated by 60 in FIG. 2 with
excellent performance characteristics.
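As an illustrative aid only, the following Python sketch models the
behaviour of one logical building block 50 as described above: two
sub-banks of 32 registers, each accepting up to 8 writes per cycle,
with 12 read ports shared across both sub-banks. The class name, method
names and error handling are hypothetical and are not taken from the
specification.

```python
class GprBuildingBlock:
    """Behavioural sketch of logical building block 50 (names are hypothetical)."""

    WRITE_PORTS_PER_SUB_BANK = 8   # N in the text
    READ_PORTS = 12                # M in the text
    DEPTH_PER_SUB_BANK = 32

    def __init__(self):
        # sub_banks[0] holds even-numbered registers, sub_banks[1] odd-numbered.
        self.sub_banks = [[0] * self.DEPTH_PER_SUB_BANK for _ in range(2)]

    def write_cycle(self, writes):
        """Apply one cycle of (register_number, value) writes.

        The low bit of the register number selects the sub-bank, so each
        sub-bank only has to select among its own 8 write ports.
        """
        per_bank = [[], []]
        for reg, value in writes:
            if not 0 <= reg < 2 * self.DEPTH_PER_SUB_BANK:
                raise ValueError("register number out of range")
            per_bank[reg & 1].append((reg >> 1, value))
        for bank, bank_writes in enumerate(per_bank):
            if len(bank_writes) > self.WRITE_PORTS_PER_SUB_BANK:
                raise ValueError("too many writes to one sub-bank in a cycle")
            for index, value in bank_writes:
                self.sub_banks[bank][index] = value

    def read_cycle(self, regs):
        """Read up to 12 registers; read ports are not interleaved."""
        if len(regs) > self.READ_PORTS:
            raise ValueError("too many reads in a cycle")
        return [self.sub_banks[reg & 1][reg >> 1] for reg in regs]
```

In this sketch sixteen writes succeed in one cycle only when they are
split evenly between even and odd target registers, which is the
balance the compiler is expected to maintain.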

Referring now to FIG. 2, the sixty-four (64) port general purpose
register file 60 includes four copies of the twenty-eight port
register file logical building block 50 of FIG. 1. With the GPR file
logical building block 50 copied four times, a 4-fold increase in read
ports is provided for a required number of 48 total read ports. For
the 64-deep register file 60, the 64 registers are divided into two
groups: one group of 32 even-numbered and one group of 32 odd-numbered
registers. In use, this register file arrangement 60 allows the
compiler to generate basic blocks of instructions that have an
approximately equal number of odd and even-numbered target register
operands through some combination of register allocation and renaming
algorithm. In the case of tightly encoded VLIW instructions, each
instruction parcel has either an even (even parcel) or odd (odd
parcel) target operand register file port permanently assigned,
obviating the need for the last bit of the register address to be
encoded in the instruction.
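The implied low-order address bit can be illustrated with a small
sketch. It assumes, hypothetically, that each parcel carries a 5-bit
target field and that the parity of the parcel's slot number supplies
the sixth bit. The function name, the field width and the use of slot
parity are illustrative assumptions, not an encoding given in the
specification.

```python
def target_register(parcel_slot: int, target_field: int) -> int:
    """Rebuild a 6-bit register number from a 5-bit field and slot parity."""
    if not 0 <= target_field < 32:
        raise ValueError("target field must fit in 5 bits")
    # Assumption: an even slot always targets an even register, an odd slot an odd one.
    implied_low_bit = parcel_slot & 1
    return (target_field << 1) | implied_low_bit
```

For example, under these assumptions a field value of 5 names register
10 in an even parcel and register 11 in an odd parcel.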

Write performance is improved in the interleaved register file 60
since only an 8-way write selection is required, as compared with a
16-way write selection in an uninterleaved register file, before a
write is performed in the target register. However, no increase in
write ports is possible from copying since all copies receive all
writes to remain valid copies. The GPR 64-port register file 60 also
improves the actual read port performance of the register file since
it places each of the copies 1, 2, 3 and 4 closest to the pipeline
data flow it drives. A disadvantage of the copy strategy is that
target operands must physically traverse the entire area of the four
copies of GPR logical building blocks 50 in the register file 60 of
FIG. 2.
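The port arithmetic behind the copy strategy can be checked with a
short sketch. The function name and parameters are hypothetical; the
figures of four copies, 12 read ports per copy and 16 write ports come
from the description above.

```python
def logical_ports(copies: int, read_ports_per_copy: int, write_ports: int):
    """Return (reads, writes, total) for a register file built from copies."""
    reads = copies * read_ports_per_copy  # each copy contributes its own read ports
    writes = write_ports                  # every copy must receive every write
    return reads, writes, reads + writes


# Four copies of building block 50: 48 reads + 16 writes = 64 logical ports.
print(logical_ports(4, 12, 16))
```

Copying multiplies only the read ports, which is why the write port
count stays at sixteen however many copies are made.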

FIG. 3 is a block diagram illustrating a write port partitioning
register file arrangement generally designated by 68 utilizing the
four copies of the GPR building block 50. Each block 70 includes two
GPR building blocks 52 with eight additional write paths or ports into
each cell. In the register file 68, any load can write all copies 1,
2, 3 and 4 simultaneously and any ALU can write all copies 1, 2, 3 and
4 simultaneously. The arrangement 68 is impractical because eight
additional write ports are required to provide 16 ALU write ports and
8 load write ports.

FIG. 4 illustrates an alternative reduced write port partitioning
register file arrangement generally designated by 78 including four
blocks 80. Each block 80 includes two 1/2 blocks or sub-banks 52 or
one block 50. In copies 1 and 2, one of the sub-banks 52 is used for
ALU writes from pipes 0-7 and the other sub-bank 52 is used for 8 load
write ports. In copies 3 and 4, one of the sub-banks 52 is used for
ALU writes from pipes 8-15 and the other sub-bank 52 is used for 8
load write ports. The partitioning arrangement 78 of FIG. 4 provides a
practical solution requiring a total of sixteen write ports, rather
than the impractical arrangement 68 of FIG. 3 that requires a total of
24 write ports. A sophisticated compiler can limit target writes only
to two of the four copies 80 in a given cycle. For ALU operations
producing target results that must be written to the register file 78,
the compiler arranges to have all or most of the work of a basic block
confined to either the first eight pipelines 0-7 or the second eight
pipelines 8-15 of the sixteen total pipeline slots or pipes in a VLIW
processor.

Each group of two copies 80 of the register file 78 need only receive
the ALU target results from one half of the sixteen pipes, either
eight pipes 0-7 or eight pipes 8-15, reducing the total number of
write ports required by eight per copy. Additional read ports are not
particularly expensive; however, additional global write ports are
very expensive since they impact all copies 1, 2, 3 and 4 with extra
input ports. When pipes 8-15 need to look at a result from pipes 0-7
and vice versa, the compiler schedules move register operations and
schedules register references appropriately so that pipe stalls for
this reason are totally avoided.
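The write routing of the partitioned arrangement 78 can be sketched as
follows. The function names and the "alu"/"load" labels are
hypothetical. The routing itself follows the description: ALU pipes 0-7
write copies 1 and 2, ALU pipes 8-15 write copies 3 and 4, and every
copy keeps 8 load write ports, so loads reach all four copies.

```python
ALL_COPIES = (1, 2, 3, 4)


def copies_written(source: str, pipe: int = 0) -> tuple:
    """Return the register file copies that receive a write.

    source is "alu" or "load" (hypothetical labels); pipe is the ALU pipe number.
    """
    if source == "load":
        return ALL_COPIES  # load write ports are present in every copy
    if source == "alu":
        if not 0 <= pipe <= 15:
            raise ValueError("pipe must be in the range 0-15")
        return (1, 2) if pipe <= 7 else (3, 4)
    raise ValueError("unknown write source")


def needs_move(producer_pipe: int, consumer_pipe: int) -> bool:
    """True when the consumer's copies never received the producer's ALU result."""
    return (producer_pipe <= 7) != (consumer_pipe <= 7)
```

A compiler following this scheme would insert a move register
operation whenever `needs_move` is true for an ALU result, and none
when producer and consumer sit in the same half of the pipes.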

FIG. 5 is a block diagram illustrating a VLIW processor unit generally
designated by 82 including the sixty-four port GPR file 78. Another
significant benefit from the ALU target partitioning scheme
illustrated in FIG. 5 is a substantial reduction in the total bus wire
length required to get any ALU result to any write port, as well as
reduced loading requirements: two copies versus four without the write
port partitioning. A line labeled 84 represents a longest ALU write
bus wire required with the partitioned register file 78. A dotted line
labeled 86 represents a longest ALU write bus wire required with the
non-partitioned register file 60. In the partitioned ALU target
register file 78, at worst, an ALU need only drive to its own and an
adjacent register file copy as opposed to driving to its next adjacent
copy in the non-partitioned arrangement. This also potentially allows
these bus wires to be double thickness and double width wires. The
thicker and wider wires provide better performance both because they
have lower resistance and because the bus wires may run only about
one-half the length required in a non-partitioned register file
arrangement.

Write port target partitioning is explicitly not used for cache data
load write ports. This is so partly because in commercial
environments, performance is often limited by the number of load pipes
available, not the number of ALUs. Thus, ALU operations can be
duplicated in each ALU target partition with little interference;
loads cannot. Also, many load dependency cases exist, for example,
linked lists, so that latency is quite important. There is almost no
cycle time/wire delay difference for the cache data load case because
the data busses are already long wires with high-current drivers.
Driving four register file copies versus only two causes an
insignificant change in bus delay, especially given that load paths
are already two cycle paths.

FIG. 6 illustrates a seventy-two port register file generally
designated by 90 including four copies of the twenty-eight port GPR
file logical building block 50 of FIG. 1. In FIG. 6, the completed
register file layout includes eight of the basic 20-port GPR sub-banks
52 yielding a logical register file capable of a maximum of 48 reads
and 24 writes per cycle for a total of 72 logical ports with very fast
read access and write-through performance capability.

FIG. 7 is a schematic diagram illustrating a cell of the register file
building block 52 of FIG. 1. A DCACHE data input is applied to an
8-way byte align 100 that is connected at its output to a 4-way
multiplexer (MUX) 102. MUX 102 is connected to a 2-way multiplexer
(MUX) 104 that together define an 8-way MUX function included in a
cell 106. Cell 106 includes a pair of latches, shown as L1 and L2
latches or a shift register latch. The L1 latch is connected to an
8-way MUX 108 that is connected at its output to an 8-way MUX 110. A
GPR address decoder generally designated by 112 including a pair of
address buffers 114, 116 and address L1 latches 118, 120 applies
address signals to the MUXs 108 and 110. A C1 clock output is used to
set the address buffers 114, 116 for the L1 and L2 latches 118 and
120. The output of MUX 110 is applied to an adder 122 that is
connected to an L2* latch 124 of an associated arithmetic logic unit
(ALU). The output of the L2* latch 124 is applied to a 4-way MUX 126
that is connected to the 2-way MUX 104 via a global network wire
labeled 128. In a complementary metal oxide semiconductor (CMOS)
integrated circuit implementation with six levels of metallization,
the global network wire 128 can be on the fifth level of metal and
made thicker and, for example, twice the standard wiring width. The
output of the L2 latch of cell 106 is applied to a 64-way MUX 130 that
provides a restore path.
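The two-stage write selection can be pictured with the following
sketch. It assumes, for illustration only, that the low two select bits
choose within each 4-way stage (MUX 102 on the cache load side, MUX 126
on the ALU side) and that the high bit drives the 2-way MUX 104. The
specification does not give a select-bit assignment, and the function
names are hypothetical.

```python
def mux(inputs, select):
    """Generic multiplexer: return the selected input."""
    return inputs[select]


def eight_way_write_mux(load_inputs, alu_inputs, select):
    """8-way select built from two 4-way stages and a final 2-way stage."""
    if len(load_inputs) != 4 or len(alu_inputs) != 4:
        raise ValueError("each 4-way stage needs exactly four inputs")
    low = select & 0b11        # assumed: picks within a 4-way group
    high = (select >> 2) & 1   # assumed: picks which group reaches the cell
    stage_102 = mux(load_inputs, low)   # cache load side
    stage_126 = mux(alu_inputs, low)    # ALU side, arriving over global wire 128
    return mux([stage_102, stage_126], high)  # 2-way MUX 104
```

Splitting the selection this way keeps the final stage at the cell
small, which is consistent with the 8-way write selection described for
the interleaved register file.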

FIG. 8 illustrates a variable read port performance feature of the
invention. For each register file copy 50, not all the twelve read
ports have the same performance requirements. Eight of the read ports
are required to be write through and as fast as possible. Four of the
fast read ports feed the right input of the ALU and as such must have
the true and complement function for subtract and other functions. The
other four fast read ports feed the other ALU input (not complemented)
and the CACHE logic (machine critical path) and also are made as fast
as possible. An L1 latch 142 provides two latch phases, -PHASE and
+PHASE. The L1 latch +PHASE is used to drive a plurality of read port
selectors 144 and 146, providing fast L1 write through ports. The
write through ports pass data from a write port to the read port in
the same clock cycle. Also the compiler can guarantee that no more
than two output read selectors will access the same register on any
given cycle, limiting the current that must be provided and improving
performance further. The L1 latch -PHASE is used to drive a buffer 148
that is coupled to a pair of selects 150 and 152, providing the four
slow L2 read ports that are non-write through and do not have
stringent performance requirements, as they are used for either store
register data access or to load exception save restore registers, both
requiring noncritical timings. The four slow read ports can tolerate
longer, thinner, and more irregular wire bussing which gives the
designer an additional degree of freedom. A bypass select 154 coupled
to the write port provides fast read ports to the ALU pipes.
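The difference between write through and non-write through read ports
can be sketched behaviourally. The model below assumes, as a
simplification not stated in the specification, that a fast port sees a
value written earlier in the same cycle while a slow port sees only the
value held at the end of the previous cycle. The class and method names
are hypothetical.

```python
class RegisterCell:
    """Behavioural sketch of one register with fast and slow read ports."""

    def __init__(self, value=0):
        self.l1 = value  # updated as soon as a write arrives
        self.l2 = value  # updated only at the end of the cycle (assumption)

    def write(self, value):
        self.l1 = value

    def fast_read(self):
        """Write through port: returns data written in the current cycle."""
        return self.l1

    def slow_read(self):
        """Non-write through port: returns the previous cycle's value."""
        return self.l2

    def end_cycle(self):
        self.l2 = self.l1
```

Under these assumptions an ALU operand read through a fast port needs
no extra cycle after the producing write, while store data read through
a slow port arrives one cycle later.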

FIG. 9 provides an exemplary layout of the GPR file logical building
block 50 of FIG. 1 bit 0. Near the middle of the layout, 64 GPR
register cells are shown divided into 32 odd cells 106 and 32 even
cells 106. An adjacent column of 8:1 multiplexers defining 4-way and
2-way MUXs 102, 126 and 104 is connected to the even and odd cells 106
by a plurality of write lines indicated by example lines labeled 160.
Eight L1 read ports and four L2 read ports are illustrated with lines
indicating example connection of the cells 106 to the read ports.

Referring to FIG. 10, another exemplary layout of the GPR file logical
building block 50 of FIG. 1 bit 0 illustrates an alternative read port
layout in accordance with the invention. As depicted and described
with respect to FIG. 8, GPR file logical building block 50 includes
eight fast L1 read ports and four slow L2 read ports. Four fast L1
read ports are provided between the write decode, clocking, buffering
and test and a first four eight-way write MUXs. Four fast L1 read
ports are provided between a second four eight-way write MUXs and the
four slow L2 read ports. Exemplary connecting lines are shown between
eight write port inputs and the first four eight-way MUXs, between
these eight-way MUXs and the cells, and between the cells and one of
the four slow L2 read ports.

FIG. 11 is a schematic diagram illustrating an exemplary layout of a
processor unit including the GPR file 78. In FIG. 11, the partitioned
GPR file 78 is shown generally centrally located above the
register/register (RR) pipes 0-7 and below the register/storage (RS)
pipes 0-5 and an associated DCACHE directory (DDIR) and segment
look-aside buffers (SLBs) 0-7. An exemplary wire is illustrated
between the even and odd GPR 1/2 copies 1 (sub-banks 52 in FIG. 1) and
a decode and control area that can be provided with thicker wire, for
example, two times standard width.



Claims 

1. A high speed register file for use with Very Long Word Instruction
(VLIW) and N-way superscalar processors comprising:

a general purpose register (GPR) building block, said GPR building
block including at least two interleaved sub-banks of registers, each
of said sub-banks including a number N of write ports and a number M
of read ports; said sub-banks being interleaved by write ports and
having non-interleaved read ports; and

a selected number of copies of said GPR building block.

2. A high speed register file as recited in claim 1 wherein said
general purpose register (GPR) building block includes an odd sub-bank
of registers including said number N of odd write ports and an even
sub-bank of registers including said number N of even write ports.

3. A high speed register file as recited in claim 1 wherein said
selected number of said copies are combined into predetermined groups.

4. A high speed register file as recited in claim 3 wherein the VLIW
processor includes sixteen arithmetic and logic units (ALUs) 0-15 and
wherein either a first eight ALUs 0-7 or a second eight ALUs 8-15
writes to predefined ones of said predetermined groups; and

wherein said selected number of said copies equals four and said first
eight ALUs 0-7 writes to two of said copies and said second eight ALUs
8-15 writes to the other two of said four copies.

5. A high speed register file as recited in claim 1 wherein said M
read ports of said at least two interleaved sub-banks of said
registers defining said GPR building block are ORed together,
providing M read ports for said GPR building block.

6. A high speed register file as recited in claim 1 wherein said
general purpose register (GPR) building block includes said number M
of read ports and wherein said selected number of said copies equals
four, providing a total of four times M read ports; and

wherein said number M of read ports equals 12, providing a total of 48
read ports.

7. A high speed register file as recited in claim 1 wherein said
general purpose register (GPR) building block includes two times said
number N of write ports.

8. A high speed register file as recited in claim 1 wherein said
selected number of copies equals four, copy 1, copy 2, copy 3 and copy
4, and wherein the VLIW processor includes sixteen arithmetic and
logic units (ALUs) 0-15 and wherein a first eight ALUs 0-7 write to
copies 1 and 2 and wherein a second eight ALUs 8-15 write to copies 3
and 4; and

wherein said four copies copy 1, copy 2, copy 3 and copy 4 are coupled
to a CACHE memory for providing CACHE load writes to all of said four
copies.

9. A high speed register file as recited in claim 1 wherein the VLIW
processor includes sixteen pipelines 0-15 and wherein said selected
number of copies equals four, copy 1, copy 2, copy 3 and copy 4; and
wherein said copies 1 and 2 are connected to said pipelines 0-7 and
said copies 3 and 4 are connected to said pipelines 8-15; and

wherein each of said copies are connected to said pipelines by eight
write ports.

10. A high speed register file as recited in any one of the previous
claims wherein said number M of said read ports include both write
through ports and non-write through ports.



FIG. 1 (drawing): eight write ports into a sub-bank 52 of 32 even
registers and eight write ports into a sub-bank 52 of 32 odd
registers, with shared read ports; ports total 16 write + 12 read
ports.






EP 0 745 933 A3

European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 96 48 0056

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication,             Relevant
where appropriate, of relevant passages           to claim

EP 0 325 310 A (PHILIPS NV)                       1,8
26 July 1989 (1989-07-26)
* column 7, line 51 - column 8, line 32 *

US 5 392 411 A (OZAKI SHINJI)                     1,2,5,7
21 February 1995 (1995-02-21)
* column 4, line 53 - column 7, line 46;
figures 2,3 *

PATENT ABSTRACTS OF JAPAN                         1,3,4,9
vol. 017, no. 692 (P-1663),
17 December 1993 (1993-12-17)
& JP 05 233281 A (TOSHIBA CORP),
10 September 1993 (1993-09-10)
* abstract *

Classification of the application (Int.Cl.): G06F9/30, G06F9/38

Technical fields searched (Int.Cl.): G06F

The present search report has been drawn up for all claims

Place of search: THE HAGUE
Date of completion of the search: 30 November 2000
Examiner: Daskalakis, T

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the
    same category
A : technological background
O : non-written disclosure
P : intermediate document

T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing
    date
D : document cited in the application
L : document cited for other reasons

& : member of the same patent family, corresponding document


ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO. EP 96 48 0056

This annex lists the patent family members relating to the patent
documents cited in the above-mentioned European search report. The
members are as contained in the European Patent Office EDP file on
30-11-2000. The European Patent Office is in no way liable for these
particulars which are merely given for the purpose of information.

Patent document          Publication    Patent family      Publication
cited in search report   date           member(s)          date

EP 0325310 A             26-07-1989     NL 8800053 A       01-08-1989
                                        DE 68909425 D      04-11-1993
                                        DE 68909425 T      07-04-1994
                                        ES 2047103 T       16-02-1994
                                        FI 890066 A,B,     12-07-1989
                                        HK 20295 A         24-02-1995
                                        JP 1217575 A       31-08-1989
                                        US 5692139 A       25-11-1997
                                        US 5103311 A       07-04-1992

US 5392411 A             21-02-1995     JP 2823767 B       11-11-1998
                                        JP 5282147 A       29-10-1993

JP 05233281 A            10-09-1993     US 5530817 A       25-06-1996

For more details about this annex: see Official Journal of the
European Patent Office, No. 12/82



